Prediction of protein domain boundaries from inverse covariances
Abstract
Even when relatively few structures had been solved, it was already known that longer protein chains often contain multiple domains, which may fold separately and act as reusable functional modules found in many contexts. In many structural biology tasks, structure prediction in particular, it is very useful to identify the domains within a structure and analyze these regions separately. When only sequence data are used, however, this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries from the size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information, including a kernel smoothing-based approach and methods based on building alpha-carbon models, and compare their performance with a length-based predictor, a homology search method, and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel-smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered, and is not significantly different from the homology-based method. When only multidomain targets are considered, the kernel-smoothing method outperforms all of the published methods except DLP-SVM. The kernel-smoothing method therefore represents a potentially useful improvement to ab initio domain prediction.
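The abstract does not give the details of the kernel-smoothing predictor, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a symmetric matrix of predicted contact scores is available, and it places a single boundary where the kernel-smoothed density of contacts crossing the candidate boundary is lowest. The function names, the Gaussian kernel, the size normalisation, and the parameters `bandwidth` and `min_domain_len` are all illustrative assumptions.

```python
import numpy as np


def crossing_contact_profile(contacts):
    """Sum of contact scores between residues i < b and j >= b, for each b."""
    n = contacts.shape[0]
    profile = np.zeros(n)
    for b in range(1, n):
        profile[b] = contacts[:b, b:].sum()
    return profile


def gaussian_smooth(values, bandwidth):
    """Kernel-smooth a 1D profile over sequence position (Gaussian kernel)."""
    idx = np.arange(len(values))
    weights = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / bandwidth) ** 2)
    # Dividing by the row sums keeps the estimate unbiased near the termini.
    return (weights @ values) / weights.sum(axis=1)


def predict_boundary(contacts, bandwidth=5.0, min_domain_len=30):
    """Return one putative boundary index, or None if the chain is too short.

    Assumption: a boundary is the position with the fewest smoothed,
    size-normalised contacts crossing it. This is a hypothetical scoring rule.
    """
    n = contacts.shape[0]
    if n < 2 * min_domain_len:
        return None
    profile = crossing_contact_profile(contacts)
    positions = np.arange(n)
    # Normalise by the number of residue pairs that could cross each boundary.
    pairs = np.ones(n)
    pairs[1:] = positions[1:] * (n - positions[1:])
    density = gaussian_smooth(profile / pairs, bandwidth)
    lo, hi = min_domain_len, n - min_domain_len
    return int(lo + np.argmin(density[lo:hi + 1]))


# Toy example: two 60-residue domains with contacts only inside each domain.
toy = np.zeros((120, 120))
toy[:60, :60] = 1.0
toy[60:, 60:] = 1.0
print(predict_boundary(toy))
```

A real predictor would also have to decide whether a chain is single-domain at all, for example by thresholding the minimum density, and would need to handle more than one boundary; neither is attempted here.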
Similar resources
An Analytical Solution for Inverse Determination of Residual Stress Field
An analytical solution is presented that reconstructs the residual stress field from limited and incomplete data. The inverse problem of reconstructing residual stresses is solved using an appropriate form of the Airy stress function. This function is chosen to satisfy the stress equilibrium equations together with the boundary conditions for a domain within a convex polygon. The analytical solu...
PPRODO: prediction of protein domain boundaries using neural networks.
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multidomain proteins but also for the experimental structure determination. Since protein sequences of multiple domains may contain much information regarding evolutionary processes such as gene-exon shuffling, this information can be detected by analyzing the ...
Discovering Domains Mediating Protein Interactions
Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multidomain proteins, and the interaction between them is often defined by pairs of their domains. Most earlier studies focus only on interacting domain pairs. However, they do not consider the in...
Delineation of modular proteins: Domain boundary prediction from sequence information
The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, our ability to predict domain boundaries rapidly and acc...
Approximating prediction error covariances among additive genetic effects within animals in multiple-trait and random regression models
A method for approximating prediction error variances and covariances among estimates of individual animals' genetic effects for multiple-trait and random regression models is described. These approximations are used to calculate the prediction error variances of linear functions of the terms in the model. In the multiple-trait case these are indexes of estimated breeding values, and for random ...